There is a trend toward the use of predictive systems in communications networks. At the systems and network management level, predictive capabilities are focused on anticipating network faults and performance degradation. Simultaneously, mobile communication networks are being developed with predictive location and tracking mechanisms. The interactions and synergies between these systems present a new set of problems. A new predictive network management framework is developed and examined. The interaction between a predictive mobile network and the proposed network management system is discussed. The Rapidly Deployable Radio Network is used as a specific example to illustrate these interactions.

Keywords: Prediction, Mobile Network Management, Time Warp, Virtual Network Configuration

1 INTRODUCTION

Recently proposed mobile networking architectures and protocols involve predictive mobility management schemes. For example, an optimization to a Mobile IP-like protocol using IP-Multicast is described in [?]. Handoffs are anticipated, and data is multicast to nodes within the neighborhood of the predicted handoff. These nodes intelligently buffer the data so that no matter where the mobile host (MH) re-associates after a handoff, no data will be lost. Another example [?] proposes deploying mobile floating agents that decouple services and resources from the underlying network. These agents are pre-assigned and pre-connected to predicted user locations. This paper focuses on the Rapidly Deployable Radio Network (RDRN) project [?] as an example of a specific predictive mobile network. The Virtual Network Configuration algorithm developed as part of RDRN uses a predictive mechanism for every phase of configuration, including location and handoff.

Progress is being made in research involving predictive system and network management. This paper develops a variation of the Virtual Network Configuration algorithm, as proposed for the RDRN, for use in a predictive network management system. The predictive capability of such a system can be used to help optimize its own operation by controlling its polling rate. Finally, a discussion of how predictive mobile networks and predictive network management systems should interact is presented.

*This paper is partially funded by ARPA contract number J-FBI-94-223 and Sprint under contract CK5007715.



2 A PREDICTIVE NETWORK MANAGEMENT SYSTEM 

Systems management means the management of heterogeneous subsystems of network devices, 
processing platforms, distributed applications, and other components found in communications 
and computing environments. Current system management relies on presenting a model to the 
user of the managed system that should accurately reflect the current state of the system and 
should ideally be capable of predicting the future health of the system. System management relies on a combination of asynchronously generated alerts and polling to determine the health of a system.

The management application presents state information such as link state, buffer fill, and packet loss to the user in the form of a model. The model can be as simple as a passive display of nodes on a screen. It can also be a more active model that changes the color of displayed nodes based on state changes, or that reacts to user input by allowing the user to manipulate the nodes, causing values to be set on the managed entity. This model can be made even more active by enhancing it with predictive capability. This enables the management system to manage itself, for example, to optimize its polling rate. The two major management protocols, the Simple Network Management Protocol (SNMP) [?] and the Common Management Information Protocol (CMIP) [?], allow the management station to poll a managed entity to determine its state. To accomplish real-time and predictive network management efficiently, the model should be updated with real-time state information when it becomes available, while other parts of the model work ahead in time. Those objects working ahead of real time can predict future operation so that system management parameters such as polling times and thresholds can be dynamically adjusted and problems can be anticipated. The model will not deviate too far from reality because processes found to deviate beyond a certain threshold are rolled back, as explained in detail later. The processes' messages must obey the following consistency rules [?]:

Rule 1

If two events are scheduled for the same process, then the event with the smaller timestamp must be executed before the one with the larger timestamp.

Rule 2

If an event executed at a process results in the scheduling of another event at a different process, then the former must be executed before the latter.
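Both rules can be checked mechanically over a record of executed events. The following is a minimal Python sketch, not part of the original system. The `Executed` record and `check_consistency` function are hypothetical names, and the trace is assumed to list events in the order they actually ran.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Executed(NamedTuple):
    """One executed event (hypothetical record), listed in the order it actually ran."""
    event_id: int
    process: str
    timestamp: float
    cause: Optional[int] = None  # event_id of the event that scheduled this one, if any


def check_consistency(trace: list[Executed]) -> list[str]:
    """Return a description of every Rule 1 or Rule 2 violation in `trace`."""
    violations: list[str] = []
    latest: dict[str, float] = {}  # largest timestamp executed so far at each process
    done: set[int] = set()         # ids of events that have already executed
    for ev in trace:
        # Rule 1: events at the same process must run in timestamp order.
        if ev.process in latest and ev.timestamp < latest[ev.process]:
            violations.append(
                f"rule 1: event {ev.event_id} ran after a later-stamped event at {ev.process}")
        latest[ev.process] = max(latest.get(ev.process, ev.timestamp), ev.timestamp)
        # Rule 2: an event must run after the event that scheduled it.
        if ev.cause is not None and ev.cause not in done:
            violations.append(f"rule 2: event {ev.event_id} ran before its cause {ev.cause}")
        done.add(ev.event_id)
    return violations


good = [Executed(1, "A", 1.0), Executed(2, "B", 2.0, cause=1), Executed(3, "A", 3.0)]
bad = [Executed(2, "B", 2.0, cause=1), Executed(1, "A", 1.0), Executed(3, "A", 0.5)]
print(check_consistency(good))  # []
print(check_consistency(bad))   # one Rule 2 violation, then one Rule 1 violation
```

Time Warp does not enforce these rules up front. It detects violations like these after they occur and repairs them through rollback.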

To determine the characteristics and performance of this predictive network management algorithm, we review research on the performance and modeling of other lookahead algorithms, and of Time Warp in particular. A comparison of the conservative Chandy-Misra approach and the optimistic Time Warp is presented in [?]. This comparison uses a typical queuing theory approach that assumes exponential service times. There have been several other detailed comparisons between conservative and optimistic methods of simulation, and these studies also make simplifying assumptions. In [?], it is shown that in a feed-forward network, a message executes earlier in Virtual Time than its corresponding message in the synchronous parallel algorithm described in [?]. In [14], it is shown that Time Warp can outperform the conservative technique known as Chandy-Misra by a factor of P, where P is the number of processors. It is also shown that no model exists in which Chandy-Misra outperforms Time Warp by a factor of P. Past work has examined the performance of Time Warp by comparing it to conservative mechanisms [13] or by simulating the Time Warp mechanism itself [15]. This paper focuses not only on analyzing and optimizing speed of execution but also on using the algorithm to maintain network management prediction accuracy.

One goal of this research is to minimize polling overhead in the management of large systems [?]. Instead of basing the polling rate on the characteristics of the data itself, the entity is emulated some time into the future to determine the characteristics of the data to be polled.
Polling is still required with this predictive network management system to verify the accuracy of 
the emulation. 



3 RELATIONSHIP BETWEEN NETWORK MANAGEMENT AND PARALLEL DISCRETE EVENT SIMULATION

Management information from standards-based managed entities must be mapped into this predictive network management system. Network management systems rely on standard mechanisms to obtain the state of their managed entities in near real time. These mechanisms, such as SNMP [?] and CMIP, use both solicited and unsolicited methods. The unsolicited method uses messages sent from a managed entity to the manager. These unsolicited messages are called traps or notifications: traps are not acknowledged, while notifications are. These messages are very similar to the messages used in distributed simulation algorithms. They contain a timestamp and a value, they are sent to a particular destination (a management entity), and they result from an event that has occurred.

Information requested by the management system from a particular managed entity is so- 
licited information. It also corresponds to messages in a distributed simulation. It provides a time 
and a value; however, not all such messages are equivalent to messages in distributed simulation. 
These messages provide the management station with the current state of the managed entity, 
even though no change of state may have occurred or multiple state changes may have occurred. 
The design of a management system that requests information concerning the state of its managed 
entities at the optimum time has always been a problem in network management. If management 
information is requested too frequently, bandwidth is wasted; if it is not requested frequently enough,
critical state change information will be missed. 

For simplicity, we assume that each managed entity is represented in the predictive management system by a Logical Process. System management would be much easier if vendors provided standard simulation code in addition to the standards-based SNMP MIBs they already supply. This code would model entity or application behavior and could be plugged into the management system just as Management Information Bases are used today. Vendors already have such simulation models of their managed devices from product development.

4 THE PREDICTIVE NETWORK MANAGEMENT SYSTEM ALGORITHM

Terminology borrowed from previous distributed simulation algorithms has a slightly different 
meaning in this predictive network management system. In addition, new terminology must be 
introduced. Thus it is important that the terminology be precisely defined. 

The predictive network management system algorithm encapsulates each Physical Process, which simulates a managed network device, within a Logical Process. A Physical Process is nothing more than the executing process defined by the program code. A Logical Process consists of the Physical Process plus additional data structures and instructions that maintain message order and correct operation as the system executes ahead of real time. A Logical Process contains a Receive Queue, a Send Queue, and a State Queue. The Receive Queue holds newly arriving messages in order of their Receive Time. The Send Queue holds copies of previously sent messages in order of their Send Time. The state of the Logical Process is periodically saved in the State Queue. The Logical Process also contains its own notion of time, known as Local Virtual Time, and a Tolerance ($\Theta$), which is the allowable deviation between actual and predicted values of incoming messages. The Current State of a Logical Process comprises the state of the Logical Process and of its encapsulated Physical Process. The predictive network management system contains a notion of the complete system time, known as Global Virtual Time, and a sliding window known as the Lookahead time ($\Lambda$). Messages contain the Send Time, Receive Time, Anti-toggle, and the message contents. The Receive Time is the time at which the message should be received by the destination Logical Process. The Send Time is the time at which the message was sent by the originating Logical Process. The Anti-toggle field is used to create an anti-message that removes the effect of a false message, as described later. A message also contains a field for the current Real Time, which is used to differentiate a real message from a virtual message.
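These data structures can be summarized in code. The sketch below is a hedged Python illustration, not the prototype's Maisie implementation. The class and field names are hypothetical, and $\Theta$ is stored as `tolerance`. The annihilation rule follows the paper's description of anti-messages: a message and its anti-message cancel when they meet in a receive queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Message:
    send_time: float     # virtual time at which the originating LP sent it
    receive_time: float  # virtual time at which the destination LP should receive it
    contents: str
    real_time: float     # wall-clock time at creation; distinguishes real from virtual
    anti: bool = False   # the anti-toggle bit

    def anti_message(self) -> Message:
        """Identical copy of this message with the anti-toggle set."""
        return replace(self, anti=True)


@dataclass
class LogicalProcess:
    lvt: float = 0.0        # Local Virtual Time
    tolerance: float = 0.0  # Theta: allowed predicted-vs-actual deviation
    receive_queue: list[Message] = field(default_factory=list)  # ordered by receive_time
    send_queue: list[Message] = field(default_factory=list)     # ordered by send_time
    state_queue: list[tuple[float, object]] = field(default_factory=list)  # (save time, state)

    def enqueue(self, msg: Message) -> None:
        """Insert msg in receive-time order; a message and its anti-message annihilate."""
        twin = replace(msg, anti=not msg.anti)
        if twin in self.receive_queue:
            self.receive_queue.remove(twin)
            return
        self.receive_queue.append(msg)
        self.receive_queue.sort(key=lambda m: m.receive_time)


lp = LogicalProcess(tolerance=10.0)
m1 = Message(1.0, 3.0, "count=5", real_time=0.0)
m2 = Message(0.5, 2.0, "count=4", real_time=0.0)
lp.enqueue(m1)
lp.enqueue(m2)
print([m.receive_time for m in lp.receive_queue])  # [2.0, 3.0]
lp.enqueue(m1.anti_message())  # meets m1, so both disappear
print(lp.receive_queue == [m2])  # True
```

Making `Message` immutable means the anti-message is an exact copy except for the toggle. Plain equality can then locate its twin in the queue.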

A driving process is required to predict future events and inject them into the system. For example, in a mobile system such as the Rapidly Deployable Radio Network [?], the Global Positioning System provides each node with its current position. The Global Positioning System receiver process runs in real time and injects future predicted location messages. In the predictive network management system, the driving process may model the number of expected users and their estimated bandwidth usage. The driving process(es) originate virtual messages via internal prediction. The remaining Physical Processes react to these messages as though they were real messages. A message that is generated and time-stamped with the current time is called a real message. Messages that contain future event information and are time-stamped with a time greater than the current time are called virtual messages. If a message arrives at a Logical Process out of order or with invalid information, it is called a false message. A false message causes a Logical Process to roll back.

Rollback is a mechanism by which a Logical Process returns to a known correct state. The 
rollback occurs in three phases. In the first phase, the Logical Process state is restored to a time 
strictly earlier than the time stamp of the false message. In the second phase, anti-messages are 
sent to cancel the effects of invalid messages that had been generated before the arrival of the 
false message. An anti-message contains exactly the same contents as the original message with 
the exception of an anti-toggle bit that is now set. When the anti-message and original message 
meet, they are both annihilated. The final phase consists of executing the Logical Process forward 
in time from its rollback state to the time the false message arrived. No messages are canceled or 
sent between the time to which the Logical Process rolled back and the time of the false message. 
Because these messages are correct, there is no need to cancel or re-send them. This improves performance and reduces the number of messages that cause rollback. Note that another false message or anti-message may arrive before this final phase has completed without causing any problems.
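The three rollback phases can be sketched as follows. This is a minimal Python illustration under simplifying assumptions that do not come from the paper:

- The Logical Process state is just a count of the input events it has processed.
- Saved states are (time, count) pairs.
- A state saved earlier than the false message always exists, such as an initial state at time 0.
- Message payloads are strings.

`rollback` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Hypothetical minimal LP whose state counts the input events it has processed."""
    lvt: float = 0.0
    state: int = 0
    received: list[float] = field(default_factory=list)                 # input timestamps
    state_queue: list[tuple[float, int]] = field(default_factory=list)  # (save time, state)
    send_queue: list[tuple[float, str]] = field(default_factory=list)   # (send time, payload)


def rollback(lp: LogicalProcess, false_ts: float) -> list[tuple[float, str]]:
    """Roll lp back for a false message stamped false_ts; return the anti-messages to send."""
    # Phase 1: restore the latest state saved strictly earlier than false_ts.
    lp.state_queue = [s for s in lp.state_queue if s[0] < false_ts]
    t0, lp.state = lp.state_queue[-1]
    # Phase 2: cancel every message this LP sent at or after false_ts.
    anti = [(t, "anti:" + m) for t, m in lp.send_queue if t >= false_ts]
    lp.send_queue = [(t, m) for t, m in lp.send_queue if t < false_ts]
    # Phase 3: coast forward from t0 to false_ts, re-applying inputs without sending;
    # messages already sent in that interval are still correct and are kept.
    lp.state += sum(1 for ts in lp.received if t0 < ts < false_ts)
    lp.lvt = false_ts
    return anti


lp = LogicalProcess(
    lvt=9.0, state=5,
    received=[1.0, 2.0, 3.0, 6.0, 8.0],
    state_queue=[(0.0, 0), (2.0, 2), (4.0, 3), (8.0, 5)],
    send_queue=[(2.0, "a"), (5.0, "b"), (7.0, "c")],
)
anti = rollback(lp, 7.0)
print(anti)               # [(7.0, 'anti:c')]
print(lp.state, lp.lvt)   # 4 7.0
```

The message sent at 5.0 survives because it lies between the restored state (4.0) and the false message (7.0). This is the saving the text describes.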



5 CHARACTERISTICS OF THE PREDICTIVE NETWORK MANAGEMENT SYSTEM 

There are two types of false messages generated in this predictive network management system. The first type is produced by messages arriving in the past of a Logical Process's Local Virtual Time. The second is produced because the Logical Process is generating results that do not match reality. If rollbacks occur for both reasons, the question arises as to whether the system will be stable. A stable predictive network management system is one in which rollbacks do not have a significant impact on system performance. A stable system is able to make reasonably accurate predictions far enough into the future to be useful. An unstable system will have its performance degraded by rollbacks to the point where it is not able to predict ahead of real time. Initial results, shown later, indicate that predictive network management systems using the algorithm described in this paper can be stable.

There are several parameters in this predictive network management system which must be 
determined. The first is how often the predictive network management system should check the 
Logical Process to verify that past results match reality. There are two conditions which cause 
Logical Processes in the system to have states which differ from the system being managed and to 
produce inaccurate predictions. The first is that the predictive model that comprises a Logical Process is most likely a simplification of the actual managed entity and thus cannot model the entity with perfect fidelity. The second is that events outside the scope of the model may occur and lead to inaccurate results. However, a benefit of this system is that it self-adjusts to both of these conditions.

The optimum choice of verification query time, $T_{query}$, is important because querying entities is something the predictive management system should minimize. At the same time, the system must still guarantee that accuracy is maintained within some predefined tolerance, $\Theta$. For example, the network management station may predict user location, as explained later. Suppose the physical layer attempts spatial reuse via antenna beamforming techniques, as in the Rapidly Deployable Radio Network project. Then some error in the steering angle of the beam is acceptable, and so the node location is allowed to be within a tolerance. The tolerances are set for each state variable sent from a Logical Process. State verification can be done in at least two ways. The Logical Process state can be compared with previously saved states as real time catches up to the saved state times. Alternatively, output message values can be compared with previously saved output messages in the send queue. In the prototype implemented for this predictive network management system, state verification is based on states saved in the state queue. This implies that all Logical Process states must be saved from the Logical Process's Local Virtual Time back to the current time.
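State verification against the state queue can be sketched as below. This is a hedged Python illustration that assumes a single numeric state variable per Logical Process. It also assumes that the state in force at the query time is the most recently saved one at or before that time. `verify` is a hypothetical helper.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Optional


def verify(state_queue: list[tuple[float, float]], t_query: float,
           actual: float, tolerance: float) -> Optional[bool]:
    """Compare the device value read at t_query with the LP's saved prediction.

    state_queue holds (save_time, predicted_value) pairs sorted by save_time.  The saved
    state in force at t_query (latest save_time <= t_query) is used; None means nothing
    has been saved yet, so there is nothing to compare against.
    """
    times = [t for t, _ in state_queue]
    i = bisect_right(times, t_query) - 1
    if i < 0:
        return None
    predicted = state_queue[i][1]
    return abs(predicted - actual) <= tolerance


saved = [(0.0, 0.0), (5.0, 12.0), (10.0, 25.0)]
print(verify(saved, 7.0, 14.0, 3.0))    # True: |12 - 14| <= 3
print(verify(saved, 10.0, 40.0, 10.0))  # False: |25 - 40| > 10, so the LP must roll back
```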

How far into the future the emulation will attempt to venture is another parameter that must be determined. This lookahead sliding window width, $\Lambda$, should be preconfigured based on the accuracy required. The farther ahead of real time this predictive network management system attempts to predict, the more risk is assumed.



S. Bush, V. Frost / Network Management of Predictive Mobile Networks 






5.1 Tolerance and Accumulated Simulation Error 

In order to consider the impact that out-of-tolerance rollback will have on the predictive system, 
consider how simulation error occurs. A predictive management system Logical Process may 
deviate from the real object because either the Logical Process does not accurately represent the 
actual entity or because events outside the scope of the predictive network management system 
may affect the entities being managed. Ignore events outside the scope of the simulation for now
and consider error from inaccurate simulation modeling only. 

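Because of the possibility of prediction error, a method of determining the amount of error in a predicted result needs to be developed. The total accumulated error in a predicted result, $AC(\cdot)$, is described by the following equations:

$$AC_n(N) = \sum_{i=1}^{N} CE_{lp_i}\left(ME_{lp_{i-1}}, t_{lp_i}\right) \qquad (1)$$

$$AC_t(\tau) = \liminf_{\sum_{i=1}^{N} t_{lp_i} \to \tau} \; \sum_{i=1}^{N} CE_{lp_i}\left(ME_{lp_{i-1}}, t_{lp_i}\right) \qquad (2)$$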

$ME_{dp}$ is the error introduced by the virtual message that the driving process injects into the predictive system. The computation error function, $CE(\cdot)$, represents the error introduced by the output message that each Logical Process computes. $t_{lp_n}$ is the actual time the $n$th Logical Process takes to calculate and output the next virtual message. Note that the Logical Process topology is not necessarily a feed-forward network as described by Equations (1) and (2); it may include cycles. Note also that the right side of Equation (2) is the greatest lower bound of all subsequential limits of $\sum_{i=1}^{N} CE_{lp_i}(ME_{lp_{i-1}}, t_{lp_i})$ as $\sum_{i=1}^{N} t_{lp_i} \to \tau$.

The driving process is indicated by $lp_0$, so $ME_{lp_0} = ME_{dp}$. $AC_n(N)$ is the total accumulated error in the virtual message output by the $N$th Logical Process from the driving process. $AC_t(\tau)$ is the accumulated error $\tau$ actual time units after the driving process generates the virtual message. For example, if a prediction result is generated in the third Logical Process from the driving process, then the total accumulated error in the result is $AC_n(3)$. If the result is produced 10 time units after the driving process generated the initial message, then $AC_t(10)$ is the total accumulated error in the result.
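Once a computation-error model is chosen, the accumulated-error sum $AC_n(N)$ can be evaluated numerically along a chain of Logical Processes. In this Python sketch, the per-LP model `drift` is hypothetical: each Logical Process passes on its input error and adds 0.5 units per unit of computation time. The sketch also assumes that the error carried by the message leaving $lp_i$ is $ME_{lp_i} = CE_{lp_i}(ME_{lp_{i-1}}, t_{lp_i})$, with $ME_{lp_0} = ME_{dp}$. The paper does not state this relation explicitly. For a finite chain, `ac_at_time` returns the running total at the last stage completed by time $\tau$ as a stand-in for $AC_t(\tau)$.

```python
from __future__ import annotations

from typing import Callable, Sequence

# CE_{lp_i}(ME_{lp_{i-1}}, t_{lp_i}) -> error in the message lp_i outputs (hypothetical model)
ErrorModel = Callable[[float, float], float]


def accumulated_error(me_dp: float,
                      chain: Sequence[tuple[ErrorModel, float]]) -> list[tuple[float, float]]:
    """Return (elapsed actual time, AC_n(i)) after each Logical Process i = 1..N.

    chain[i-1] = (CE_{lp_i}, t_{lp_i}); me_dp is the error in the driving process's message.
    """
    me, ac, elapsed = me_dp, 0.0, 0.0
    trajectory = []
    for ce, t in chain:
        me = ce(me, t)   # assumed: ME_{lp_i} = CE_{lp_i}(ME_{lp_{i-1}}, t_{lp_i})
        ac += me         # running sum of the accumulated-error terms
        elapsed += t
        trajectory.append((elapsed, ac))
    return trajectory


def ac_at_time(trajectory: list[tuple[float, float]], tau: float) -> float:
    """Finite-chain stand-in for AC_t(tau): total error of the last stage finished by tau."""
    done = [ac for elapsed, ac in trajectory if elapsed <= tau]
    return done[-1] if done else 0.0


def drift(me_in: float, t: float) -> float:
    return me_in + 0.5 * t  # pass the input error through, add 0.5 per time unit


traj = accumulated_error(1.0, [(drift, 2.0)] * 3)
print(traj)                   # [(2.0, 2.0), (4.0, 5.0), (6.0, 9.0)]
print(ac_at_time(traj, 5.0))  # 5.0
```

In this toy chain, $AC_n(3) = 9$. That value is the accumulated error of a result produced by the third Logical Process from the driving process, as in the example above.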

5.2 Optimum Choice of Verification Query Times 

As previously stated, the prototype system performs verification based on the states in the state queue. One method of choosing the verification query time is to query the managed entity based on the frequency of the data being monitored. Assuming the simulated data is correct, the entity is queried (sampled) in a way that allows the data to be reconstructed perfectly, e.g. at a rate above twice the maximum frequency component of the monitored data. A possible drawback is that the actual data may be changing at a multiple of the predicted rate. In that case the samples may appear accurate when they are invalid.
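The frequency-based choice can be made concrete with the sampling theorem. A band-limited signal can be recovered when it is sampled at more than twice its highest frequency, so the query period should stay below $1/(2 f_{max})$. The Python sketch below estimates $f_{max}$ from the predicted samples with a plain discrete Fourier transform. Two parts of it are assumptions: the 5% relative-magnitude cutoff that decides which frequency components count, and the function name. The sketch suits a rate-like signal; a cumulative counter would first be differenced. As the text warns, the estimate is only as good as the prediction it is computed from.

```python
import cmath
import math
from typing import Optional, Sequence


def max_query_period(samples: Sequence[float], dt: float,
                     rel_cutoff: float = 0.05) -> Optional[float]:
    """Upper bound 1 / (2 f_max) on the query period for predicted samples spaced dt apart.

    f_max is the highest DFT frequency whose magnitude is at least rel_cutoff times the
    largest non-DC magnitude.  Needs at least two samples; returns None for a constant signal.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]      # remove the DC component
    mags = []
    for k in range(1, n // 2 + 1):       # frequencies k / (n * dt) up to Nyquist
        coef = sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        mags.append(abs(coef))
    peak = max(mags)
    if peak == 0.0:
        return None
    k_max = max(k for k, m in enumerate(mags, start=1) if m >= rel_cutoff * peak)
    return 1.0 / (2.0 * k_max / (n * dt))


# A predicted 0.1 Hz oscillation sampled once per second over 100 seconds.
predicted = [math.sin(2 * math.pi * 0.1 * j) for j in range(100)]
print(max_query_period(predicted, 1.0))  # about 5.0 seconds
```

In practice the actual period would be set somewhat below this bound. Sampling exactly at twice the signal frequency can land on the signal's zero crossings.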

5.3 Verification Tolerance 

The verification tolerance, $\Theta$, is the amount of difference allowed between the Logical Process state and the actual entity state. A large tolerance decreases the number of false messages and rollbacks, thus increasing performance and requiring fewer management queries. Allowing a larger probability of error between the predicted and actual values will cause rollbacks in each Logical Process at real times of $t_{vfail}$ from the start of execution of that Logical Process.

The error throughout the simulated system may be randomized in such a way that errors among Logical Processes cancel. However, if the simulation is composed of many Logical Processes of the same class, the errors may compound rather than cancel each other. The tolerance of a particular Logical Process, $\Theta_{lp_n}$, will be reached at time $t_{vfail} = \inf\{\tau : AC_t(\tau) > \Theta_{lp_n}\}$, the earliest time at which the accumulated error exceeds the tolerance.

The verification query should be periodic, with period $T \le t_{vfail}$, in order to maintain accuracy within the tolerance.
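Given samples of the accumulated error $AC_t(\tau)$, for instance from a calibration run, a verification query period satisfying $T \le t_{vfail}$ can be read off directly. This Python sketch assumes the samples are sorted by $\tau$. It treats the last sample that is still within the tolerance as a conservative estimate of $t_{vfail}$. `max_verification_period` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


def max_verification_period(ac_curve: list[tuple[float, float]],
                            theta: float) -> Optional[float]:
    """Largest sampled tau whose accumulated error is still within theta.

    ac_curve holds (tau, AC_t(tau)) pairs sorted by tau.  Scanning stops at the first
    sample that exceeds theta, so at the sampling resolution the result does not
    overshoot t_vfail.  Returns None if theta is never exceeded in the sampled range.
    """
    last_ok = 0.0
    for tau, ac in ac_curve:
        if ac > theta:
            return last_ok
        last_ok = tau
    return None


curve = [(1.0, 1.0), (2.0, 2.5), (3.0, 4.5), (4.0, 7.0)]
print(max_verification_period(curve, theta=5.0))   # 3.0
print(max_verification_period(curve, theta=10.0))  # None: no constraint within 4 time units
```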

The accuracy of any predicted event must be quantified. This could be quantified as the 
probability of occurrence of a predicted event. The probability of occurrence will be a function of 
the verification tolerance, the time of last rollback due to verification error, the error between the 
simulation and actual entity, and the sliding lookahead window. Every Logical Process will be 
in exact alignment with its Physical Process as a result of a state verification query. This occurs every $T_{query} = t_{vfail}$ time units.

5.4 Length of Lookahead Window 

The length of the lookahead window, $\Lambda$, should be as large as possible while maintaining the required accuracy. The total error is a function of the chain of messages that lead to the state in question. The system advances $t_{ahead} = GVT - t_{current\text{-}time}$ ahead of real time. The larger $t_{ahead}$ is, the more messages occur before a verification query can be made, and the greater the error. The maximum error is $AC_t(\Lambda)$.

5.5 Simulation Time 

Since the verification query time is less than or equal to the current time, $t_{current\text{-}time}$, rollbacks due to the verification query will take the Logical Process back to the current time. Thus, Global Virtual Time as defined in [?] is no longer a lower bound on the simulation rollback time; the lower bound is now $t_{current\text{-}time}$. Global Virtual Time is still required in order to determine how far into the future the predictive network management system has advanced.

5.6 Calibration Mode of Operation

It may be helpful to run the predictive network management system in a mode in which the error between the actual entities and the predictive network management system is measured. This error information can be used during the normal predictive mode to help set the above parameters. This has an effect similar to back-propagation in a neural network, i.e. the predictive network management system automatically adjusts parameters in response to output in order to become more accurate. This calibration mode could be part of normal operation. The error can be tracked simply by recording the difference between the simulated messages and the results of verification queries.
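Calibration-mode bookkeeping can be as simple as logging the difference at each verification query and deriving a parameter from the logged differences. In this Python sketch, the rule for the suggested tolerance is an assumption made for illustration, not a rule from the paper. The rule is the mean absolute error plus k standard deviations. `Calibrator` is a hypothetical class.

```python
from __future__ import annotations

import statistics
from typing import Optional


class Calibrator:
    """Hypothetical calibration-mode tracker of predicted-vs-verified differences."""

    def __init__(self) -> None:
        self.errors: list[float] = []

    def record(self, predicted: float, verified: float) -> None:
        """Log the gap between a simulated value and the verification query result."""
        self.errors.append(abs(predicted - verified))

    def suggested_tolerance(self, k: float = 2.0) -> Optional[float]:
        """Mean absolute error plus k standard deviations (needs two or more samples)."""
        if len(self.errors) < 2:
            return None
        return statistics.mean(self.errors) + k * statistics.stdev(self.errors)


cal = Calibrator()
for predicted, verified in [(10, 11), (10, 12), (10, 7)]:
    cal.record(predicted, verified)
print(cal.suggested_tolerance())  # 4.0
```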



6 MODEL AND SIMULATION 

The algorithm described in this paper has been implemented and analyzed in [?], which describes a predictive mobile network. This paper extends the algorithm to network management. An initial test of this algorithm in a network management environment has been performed in a simulation of a predictive management system implemented with Maisie [19]. Maisie's suitability has been demonstrated in the RDRN network management and control design and development. It has also been demonstrated in [20], where it was used to develop a mobile wireless network parallel simulation environment. This parallel simulation environment shows a speedup over the commercial sequential simulation packages currently in use. The environment and a set of modules developed for mobile network simulation are described in [20]. Maisie uses a language influenced by a classic work describing the characteristics of a parallel programming language structure [?].

Since every Maisie entity has a built-in input queue, each Logical Process consists of three additional Maisie entities:

• An entity which represents the Physical Process 

• An entity for the Logical Process state queue 

• An entity for the Logical Process output message queue 

There is also a gvt entity for the calculation of Global Virtual Time. All three of the above entities work together to implement Virtual Time as described in [17]. The first entity above,
representing the Physical Process, contains a delay mechanism in order to implement the sliding 
lookahead window. The gvt process should notify all processes to cease forward simulation when 
Global Virtual Time reaches the end of the window. However, in this version of the predictive 
management system, each Logical Process simply compares its Local Virtual Time to the current 
time and holds processing until current time is back within the lookahead sliding window. 

Global Virtual Time should be determined as defined in [?]. This algorithm allows Global Virtual Time to be determined in a message-passing environment, as opposed to the easier case of a shared memory environment. It also allows normal processing to continue during the Global Virtual Time determination phase. However, in this implementation each output message is sent to the gvt entity as well as to its proper destination. In addition, the gvt entity checks all Logical Processes for their current Local Virtual Time. It then takes the minimum over message send times and Local Virtual Times as the current Global Virtual Time. The gvt entity is allowed to execute in parallel with the other entities in this simulation, so it may not always be perfectly accurate. This is because messages may be in transit when the Local Virtual Time request poll takes place, and because Logical Processes are changing while the Global Virtual Time computation is taking place. However, the results are close enough for the purpose of these experiments.
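The prototype's Global Virtual Time rule can be sketched in Python as below. GVT is taken as the minimum over Local Virtual Times and message send times. The text does not say how the gvt entity learns that a message has been delivered. Here it is assumed to receive the ids of delivered messages and to consider only messages still in transit, since otherwise the minimum could never advance. All names are hypothetical. As in the prototype, the result is only an estimate if Logical Processes advance or messages move while it is being computed.

```python
from __future__ import annotations


def estimate_gvt(lvts: dict[str, float],
                 sent: list[tuple[int, float]],
                 received_ids: set[int]) -> float:
    """Minimum over every LP's Local Virtual Time and the send time of each message
    the gvt entity has seen sent but not yet seen received (still in transit)."""
    in_transit = [send_time for msg_id, send_time in sent if msg_id not in received_ids]
    return min([*lvts.values(), *in_transit])


lvts = {"switch1": 12.0, "switch2": 9.0, "switch3": 15.0}
sent = [(1, 8.0), (2, 10.0)]             # (message id, send time) copies sent to gvt
print(estimate_gvt(lvts, sent, {2}))     # 8.0: message 1 is still in transit
print(estimate_gvt(lvts, sent, {1, 2}))  # 9.0: switch2's Local Virtual Time is the minimum
```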

6.1 Verification Query Rollback Versus Causality Rollback 

Verification query rollbacks are the most critical part of the predictive management system. They 
are handled in a slightly different fashion from causality failure rollbacks. A state verification failure causes the Logical Process state to be corrected at the time of the state verification that failed. The state $S_v$ has been obtained from the actual device by the verification query at time $t_v$. The Logical Process rolls back to exactly $t_v$ with state $S_v$. States with times greater than $t_v$ are removed from the state queue. Anti-messages are sent from the output message queue for all messages with send times greater than $t_v$. The Logical Process continues forward execution from this point. This implies that the message and state queues cannot be purged of elements that are older than the Global Virtual Time; only elements that are older than real time can be purged.
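A verification-query rollback differs from a causality rollback in where the Logical Process lands. It lands exactly at $t_v$, with the state read from the device rather than a previously saved one. There is no need to re-execute forward from an older saved state. The following minimal Python sketch assumes an integer state and string message payloads. `verification_rollback` is a hypothetical helper.

```python
from __future__ import annotations


def verification_rollback(state_queue: list[tuple[float, int]],
                          send_queue: list[tuple[float, str]],
                          t_v: float, s_v: int):
    """Apply a failed state verification: state s_v was read from the device at time t_v.

    Returns (new state queue, new send queue, messages to cancel with anti-messages).
    """
    # Discard saved states after t_v and install the verified state exactly at t_v.
    states = [(t, s) for t, s in state_queue if t < t_v] + [(t_v, s_v)]
    # Messages sent after t_v were based on the wrong state and must be cancelled.
    cancel = [(t, m) for t, m in send_queue if t > t_v]
    sends = [(t, m) for t, m in send_queue if t <= t_v]
    return states, sends, cancel


states, sends, cancel = verification_rollback(
    [(0.0, 0), (4.0, 5), (6.0, 8), (9.0, 11)],
    [(3.0, "a"), (6.0, "b"), (8.0, "c")],
    t_v=6.0, s_v=7)
print(states)  # [(0.0, 0), (4.0, 5), (6.0, 7)]
print(cancel)  # [(8.0, 'c')]
```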
6.2 The Prototype System Simulation

A simple network management system was simulated to test the concept of the predictive man- 
agement protocol just described. Note that none of the previous assumptions are made in the 
simulation. The purpose of this simulation is to determine if the concept is feasible. A key ques- 
tion this simulation attempts to answer is whether the overhead/performance ratio results in a 
useful system. A small closed queuing network with First-Come-First-Served servers represents the actual system. Figure 1 shows the real system to be managed and the predictive management model. In this initial feasibility study, the managed system and the predictive management model are both modeled with Maisie. Figure 1 explicitly illustrates the verification queries between the real system and the management model.



Figure 1: Initial Feasibility Network Model (diagram: the actual system and the Predictive Management Model, connected by Verification Query links)

The system consists of three switch-like entities. Each switch contains a single queue and 10 servers with exponentially distributed service times that sequentially service each packet. A mean service time of 10 time units is assumed. The servers represent the link rate. Each packet is then forwarded with equal probability to one of the switches, including the originating switch. Each switch is a driving process; the switches forward real and virtual messages. The state is the cumulative number of packets that have entered each switch's queue. This is similar to the statistics monitored by Simple Network Management Protocol [?] Counters, for example, the ifInOctets counter in the MIB-II interfaces group [?].

Both real and virtual messages contain the time service ends and a count of the number of times a packet has entered a switch. An initial message enters each queue upon startup to associate a queue with its switch; this is the purpose of the idmsg that enters the queues in Figure 1. The predictive system parameters are identified compactly as a triple ($\Lambda$, $\Theta$, $T$):

- $\Lambda$: Lookahead Window Size (seconds)
- $\Theta$: Tolerance (counter value)
- $T$: Verification Query Period (seconds)

The effect of these parameters is examined on the system of switches previously described. The simulation was run with the following triples: (5, 10, 5), (5, 10, 1), (5, 3, 5), and (400, 5, 5). The graphs that follow show the results for each triple.

The first run used the parameters (5, 10, 5). There were no state verification rollbacks, although there were some causality-induced rollbacks, as shown in Figure 2. Global Virtual Time increased almost instantaneously relative to real time; at times the next event fell far beyond the lookahead window. This explains the nearly vertical jumps in Global Virtual Time as a function of real time shown in Figure 2. The state graph for this run is shown in Figure 3.



Figure 2: Rollbacks Due to State Verification Failure (5, 10, 5) (plot: GVT versus Real Time)

In the initial implementation, state verification was performed in the Logical Process immediately after each new message was received. A Logical Process running ahead at its Local Virtual Time saves future states. However, it rarely happened that one of those saved states had the same save time as the arrival time of a real message. Thus, there was frequently nothing to compare the current state with in order to perform the state verification. It was also observed that the predictive system was simulating up to the lookahead window very quickly and then spending most of its time holding, during which it was doing nothing. The implementation was modified so that each entity performs state verification during its hold time. This design change made better use of the processors and aligned the Logical Processes more accurately with the actual processes.
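The modified prototype's behavior can be sketched as a per-step decision. The Logical Process processes events while they fall inside the lookahead window and spends the hold time on state verification. This Python sketch is an illustration only, not the Maisie code. The priority given to events over verifications and all names are assumptions.

```python
from __future__ import annotations


def next_action(now: float, lookahead: float,
                event_times: list[float], check_times: list[float]) -> tuple[str, float | None]:
    """Decide what a Logical Process does next (hypothetical scheduling rule).

    event_times: pending input-event timestamps, ascending (consumed when processed).
    check_times: pending state-verification times, ascending (consumed when verified).
    """
    if event_times and event_times[0] <= now + lookahead:
        return "process", event_times.pop(0)  # still inside the lookahead window
    if check_times and check_times[0] <= now:
        return "verify", check_times.pop(0)   # outside the window: use the hold time
    return "hold", None


events, checks = [3.0, 12.0], [1.0]
print(next_action(2.0, 5.0, events, checks))  # ('process', 3.0)
print(next_action(2.0, 5.0, events, checks))  # ('verify', 1.0): 12.0 is beyond 2.0 + 5.0
print(next_action(2.0, 5.0, events, checks))  # ('hold', None)
```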

The results for the (5, 10, 1) run were similar, except that the predictive and actual systems were compared more frequently. This is because the state verification period had been changed from once every 5 seconds to once every second. Error was measured as the difference between the predicted Logical Process state and the actual system state. This run showed errors that were greater than those in the first run, great enough to cause state verification rollbacks. The error levels for both runs are shown in Figures 4 and 5. The state graph for this run is shown in Figure 6.

Figure 3: State (5, 10, 5)

The next run used (5, 3, 5) parameters. Here we see many more state verification failure rollbacks, as shown in Figure 7. This is expected, since the tolerance has been reduced from 10 to 3. The cluster of causality rollbacks near the state verification rollbacks was also expected; these clusters do not appear to significantly reduce the feasibility of the system. The Global Virtual Time versus real time plot in Figure 7 shows much larger jumps, as the Logical Processes were held back by rollbacks. The entities had a larger variance in their hold times than in the (5, 10, 5) run. The state graph for this run is shown in Figure 8.

A (400, 5, 5) run showed the Global Virtual Time jump quickly to 400 and then increase gradually as the sliding lookahead window maintained a 400 time unit lead, as shown in Figure 9. The Logical Process hold times were shorter than in any previous run. The state graph for this run is shown in Figure 10.
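The effect of the sliding lookahead window on Global Virtual Time can be illustrated with a short simulation. It is a simplified sketch under stated assumptions: the event streams and the function `simulate` are hypothetical, each Logical Process processes every event whose timestamp lies within real time plus the lookahead and then holds, and GVT is taken as the minimum Local Virtual Time, ignoring messages in transit.

```python
def simulate(event_times_per_lp, lookahead, real_times):
    """Track GVT against real time under a sliding look-ahead window (sketch).

    event_times_per_lp: hypothetical, sorted virtual timestamps of each LP's events.
    An LP may process an event only if its timestamp is <= real_time + lookahead;
    otherwise it holds. GVT is the minimum Local Virtual Time over all LPs.
    """
    pending = [list(times) for times in event_times_per_lp]  # copies, not mutated inputs
    lvts = [0] * len(pending)
    trace = []
    for rt in real_times:
        limit = rt + lookahead
        for i, events in enumerate(pending):
            # Process every event inside the window, then hold.
            while events and events[0] <= limit:
                lvts[i] = events.pop(0)
        trace.append((rt, min(lvts)))
    return trace


# Hypothetical event streams: one LP with an event every time unit,
# another with an event every two time units.
events = [list(range(1, 1000)), list(range(1, 1000, 2))]
for rt, gvt in simulate(events, lookahead=400, real_times=[0, 1, 2, 100]):
    print(rt, gvt, gvt - rt)
```

GVT jumps almost immediately to about the lookahead value and thereafter never leads real time by more than the lookahead, which mirrors the roughly 400 time unit lead described for the (400, 5, 5) run.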

This set of results is interesting because it shows that the system remains stable with the introduction of state verification rollbacks. The overhead introduced by these rollbacks did not greatly impact performance because, as shown in the Global Virtual Time versus real time graphs in Figures 2, 7 and 9, the system was always able to predict up to its lookahead time very quickly.
Figure 5: Amount of Error (5, 10, 1)



S. Bush, V. Frost / Network Management of Predictive Mobile Networks 



Figure 6: State (5, 10, 1) [plot: Emulated State vs Real State; Predicted State and Actual State versus Real Time and GVT (secs)]

Figure 7: Rollbacks Due to State Verification Failure (5, 3, 5)



Figure 8: State (5, 3, 5) [plot: Emulated State vs Real State; Predicted State and Actual State versus Real Time and GVT (secs)]

Figure 9: Rollbacks Due to State Verification Failure (400, 5, 5)

Figure 10: State (400, 5, 5) [plot: Emulated State vs Real State]



7 OPTIMIZING MANAGEMENT POLLING WITH THE PREDICTIVE MANAGER 



Since the predictive network management system provides a good approximation of the future behavior of the managed data, as shown by the predicted and actual state values plotted against real time and Global Virtual Time in Figures 3, 6, 8 and 10, the verification query period can be determined automatically as a function of the look-ahead window and tolerance. The goal is to minimize the frequency of verification queries, thus solving the polling problem in network management.

In most standards-based approaches, network management stations sample counters in managed entities that simply increment in value until they roll over. A management station that is simply plotting data will have some fixed polling interval and record the difference in the counter value over each interval. Such a graph is not a perfectly accurate representation of the data; it merely states that sometime within a polling interval the counter increased monotonically by some amount. Spikes in this data, which may be very important to the current state of the system, may go unnoticed if the polling interval is long enough that a spike followed by low data values averages out to a normal or low value. One of the goals of a predictive management system is to determine the polling interval required to represent the data accurately while polling as infrequently as possible.
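The averaging effect can be illustrated with a short sketch. It assumes a 32-bit wrapping counter, as with an SNMP Counter32, and a hypothetical per-second traffic trace; the function `polled_rates` is illustrative only, and it computes the rate as the counter difference divided by the polling interval, assuming at most one wrap per interval.

```python
COUNTER_MAX = 2**32  # a 32-bit counter such as SNMP Counter32 wraps at 2^32


def polled_rates(counter_samples, interval):
    """Per-second rates seen by a station polling every `interval` seconds.

    counter_samples: counter value at each second (hypothetical trace).
    Modular subtraction handles a single wraparound between polls.
    """
    polls = counter_samples[::interval]
    return [((b - a) % COUNTER_MAX) / interval for a, b in zip(polls, polls[1:])]


# Hypothetical traffic: 10 units/s with a one-second spike of 500 units.
per_second = [10] * 10
per_second[6] = 500
counter = [0]
for x in per_second:
    counter.append((counter[-1] + x) % COUNTER_MAX)

print(polled_rates(counter, 1))  # the spike appears as a single 500.0 reading
print(polled_rates(counter, 5))  # the spike is averaged into a single reading
```

With a 5 second interval the 500 unit spike is reported only as an average rate of about 108 units per second, which is the loss of information the text describes.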

From the information provided by the predictive management system, a polling interval that provides the desired degree of accuracy can be determined and dynamically adjusted; however, its cost must also be determined. An upper limit on the number of systems that can be polled is $N \le T/\Delta$, where $N$ is the number of devices capable of being polled, $T$ is the polling interval, and $\Delta$ is the time required for a single poll. Thus, although data accuracy will be constrained by this upper limit, taking advantage of characteristics of the data to be monitored can help distribute the polling intervals efficiently within this constraint. Assume that $\Delta$ is calculated and fixed, as is $N$. The constraint then gives a lower bound on the polling interval, $T \ge N\Delta$.

The overhead bandwidth required by the management system to perform polling is given below. The packet size will vary depending upon whether it is an SNMP or CMIP packet and on the MIB object(s) being polled. The number of packets varies with the amount of management data requested. Let $P$ be the number of packets, $S$ be the bits per packet, $N$ be the number of devices polled, and $T$ be the polling period. $Bw$ is the total available bandwidth and $Bw_{oh}$ is the overhead bandwidth of the management traffic:

$$Bw_{oh} = \frac{P S N}{T}$$

It may be desirable to limit the bandwidth used for polling system management data to no more than a certain percentage of the total bandwidth $Bw$. The optimum polling interval therefore balances using the least bandwidth against keeping the variance due to error in the data signal as small as possible. All the information required to maintain the cost versus accuracy trade-off at a desired level is provided by the predictive network management system.
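The two cost constraints can be combined in a small calculation. This sketch assumes the overhead takes the form $Bw_{oh} = PSN/T$ (packets times bits per packet times devices, divided by the polling period) and expresses the bandwidth cap as a hypothetical fraction `max_fraction` of $Bw$; the function names and all numbers are illustrative, not taken from the paper.

```python
def overhead_bandwidth(packets, bits_per_packet, n_devices, period_s):
    """Management overhead Bw_oh = P * S * N / T, in bits per second."""
    return packets * bits_per_packet * n_devices / period_s


def min_period(packets, bits_per_packet, n_devices, total_bw, max_fraction,
               poll_time_s):
    """Shortest polling period meeting both constraints (sketch).

    Bandwidth cap: Bw_oh <= max_fraction * Bw  =>  T >= P*S*N / (max_fraction*Bw)
    Poll timing:   T >= N * Delta (devices polled one after another)
    """
    t_bandwidth = packets * bits_per_packet * n_devices / (max_fraction * total_bw)
    t_timing = n_devices * poll_time_s
    return max(t_bandwidth, t_timing)


# Hypothetical numbers: 2 packets of 800 bits per device, 100 devices,
# a 1 Mb/s link, management traffic capped at 1%, 50 ms per poll.
T = min_period(2, 800, 100, 1_000_000, 0.01, 0.05)
print(T, overhead_bandwidth(2, 800, 100, T))
```

Here the bandwidth cap (T of at least 16 s) dominates the timing bound (T of at least 5 s); a predictive manager could then lengthen T further wherever the predicted data varies slowly.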



8 INTERACTION BETWEEN A PREDICTIVE MANAGEMENT SYSTEM AND A PREDICTIVE MOBILE NETWORK

There is an interesting interaction between the predictive management system and the predictive mobile network. A predictive mobile network such as the Rapidly Deployable Radio Network [l^] will have cached results in advance of use for many configuration parameters. These results should be part of the Management Information Base for the mobile network and include the predicted time of the event requiring the result, the value of the result, and the probability that the result will be within tolerance. Thus there will be a triple associated with each predicted event: (time, value, probability). Network management protocols, e.g. SNMP and CMIP [p^], include a time as part of the Protocol Data Unit; however, this time indicates the real time at which the poll occurred, not the predicted time of the event.
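One way to picture such a MIB entry is as a record holding the triple, kept separate from the poll timestamp carried in the Protocol Data Unit. The sketch below is a hypothetical data model, not an SNMP or CMIP MIB definition: the class names, the numeric `value` field, and the probability range check are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedEvent:
    """Hypothetical MIB row for one predicted event: (time, value, probability)."""
    time: float         # predicted time of the event requiring the result
    value: float        # cached result computed in advance of use
    probability: float  # probability that the result will be within tolerance

    def __post_init__(self):
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


@dataclass(frozen=True)
class PollResponse:
    """A poll result: the PDU time is the real time of the poll, which is
    distinct from the predicted event time stored in the row itself."""
    poll_time: float
    event: PredictedEvent

    def lead_time(self) -> float:
        # How far ahead of the poll the predicted event lies.
        return self.event.time - self.poll_time


r = PollResponse(poll_time=120.0,
                 event=PredictedEvent(time=150.0, value=3.0, probability=0.8))
print(r.lead_time())
```

Keeping the predicted time inside the managed object, rather than relying on the PDU timestamp, is what lets a manager tell how far in advance a result was predicted.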

A predictive management system could simply use Logical Processes to represent the predictive mobile processes as previously described; however, this is redundant, since the mobile network itself has predicted events in advance as part of its own management and control system. Therefore, managing a predictive mobile network with a predictive network management system poses an interesting problem: how to obtain the maximum benefit from both of these predictive systems.

Combining the two predictive systems in a low-level manner, e.g. allowing their Logical Processes to exchange messages with each other, raises questions about synchronization between the mobile network and the management station. However, the predicted mobile network results can be used as additional information to refine the management system results. The management system will have computed (time, value, probability) triples for each predicted result as well. The final result produced by the management system would then be an average of the times and values weighted by their respective probabilities. An additional weight may be applied to reflect the quality of either system. For example, the network management system might be weighted higher because






it has more knowledge about the entire network. Alternatively, the mobile network system may be weighted higher because it may have better predictive capability for the detailed events concerning handoff. Thus the two systems do not directly interact with each other, but the final result is a combination of the results from both predictive systems. A more complex method of combining results from these two systems would involve a causal network such as the one described in [Q].
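A minimal sketch of this probability-weighted combination follows. It assumes each system reports one (time, value, probability) triple for the same predicted event and that the optional per-system quality weights (a hypothetical parameter `quality`) simply multiply the probabilities before normalization; the function name and this weighting scheme are illustrative assumptions rather than the paper's specification.

```python
def combine(triples, quality=None):
    """Combine (time, value, probability) predictions of the same event.

    Each triple is weighted by its probability, optionally multiplied by a
    per-system quality weight. Returns the weighted mean (time, value).
    """
    if quality is None:
        quality = [1.0] * len(triples)
    weights = [p * q for (_, _, p), q in zip(triples, quality)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    time = sum(w * t for w, (t, _, _) in zip(weights, triples)) / total
    value = sum(w * v for w, (_, v, _) in zip(weights, triples)) / total
    return time, value


management = (100.0, 40.0, 0.6)  # hypothetical management-system prediction
mobile = (110.0, 50.0, 0.9)      # hypothetical mobile-network prediction

print(combine([management, mobile]))                    # probabilities only
print(combine([management, mobile], quality=[2.0, 1.0]))  # favor management
```

Without quality weights the result leans toward the more confident mobile-network prediction; doubling the management system's weight, as when it has more knowledge of the entire network, pulls the combined time and value back toward its prediction.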



9 CONCLUSION 

Network management systems capable not only of passive monitoring but also of active prediction are undergoing research and development. Work on prediction mechanisms for mobile communication networks is also underway. This paper has proposed a method by which standards-based network management systems can cope with these two developments.

A predictive network management algorithm and the characteristics of a predictive network management system have been presented. The predictive capability of the network management system is used to solve the polling rate problem for network management. The Rapidly Deployable Radio Network [p^] is presented as an example of a predictive mobile communications network. Finally, the interaction between the predictive capabilities of the network management and mobile network systems has been discussed.